Smart Paper Notes

Detect-and-Segment: a Deep Learning Approach to Automate Wound Image Segmentation

Gaetano Scebba , Jia Zhang , Sabrina Catanzaro , Carina Mihai , Oliver Distler , Martin Berli , Walter Karlen

Category: Computer Vision

2021-11-02

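Chronic wounds significantly impact quality of life. If not properly managed, they can severely deteriorate. Image-based wound analysis can support an objective assessment of wound status by quantifying important features related to healing. However, the high heterogeneity of wound types, image background composition, and capture conditions makes robust segmentation of wound images challenging. We present Detect-and-Segment (DS), a deep learning approach that produces wound segmentation maps with high generalization capability. In our approach, dedicated deep neural networks detect the wound position, isolate the wound from the uninformative background, and compute the wound segmentation map. We evaluated this approach on one data set of diabetic foot ulcer images. For further testing, we used 4 supplemental independent data sets with a larger variety of wound types from different body locations. The Matthews correlation coefficient (MCC) improved from 0.29 when the segmentation was computed on the full image to 0.85 when detection and segmentation were combined in the same approach. When tested on wound images drawn from the supplemental data sets, the DS approach increased the mean MCC from 0.17 to 0.85. Furthermore, the DS approach enabled segmentation models to be trained with up to 90% less training data while maintaining segmentation performance.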
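The abstract reports segmentation quality as a Matthews correlation coefficient. As a rough illustration of how that metric is obtained from a predicted and a reference binary mask, here is a minimal sketch in plain Python. The function name, the flat 0/1 mask representation, and the toy masks are assumptions made for this example and are not taken from the paper.

```python
import math


def matthews_corrcoef(pred, truth):
    """MCC for two equally sized binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Count the four confusion-matrix cells pixel by pixel.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when the denominator vanishes
    # (for example, a mask that is entirely background).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy masks (hypothetical): 1 marks a wound pixel, 0 marks background.
truth = [0, 0, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 1, 1, 0, 0, 1, 0]
score = matthews_corrcoef(pred, truth)
```

MCC ranges from -1 (every pixel wrong) through 0 (no better than chance) to 1 (perfect agreement). Because it uses all four confusion-matrix cells, it stays informative when the wound covers only a small part of the image.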

translated by Google Translate

All's well that FID's well? Result quality and metric scores in GAN models for lip-sychronization tasks

Carina Geldhauser , Johan Liljegren , Pontus Nordqvist

Categories: Computer Vision | Machine Learning (Statistics)

2022-12-28

We test the performance of GAN models for lip-synchronization. For this, we reimplement LipGAN in PyTorch, train it on the GRID dataset, and compare it to our own variation, L1WGAN-GP, adapted to the LipGAN architecture and also trained on GRID.


Learning efficient backprojections across cortical hierarchies in real time

Kevin Max , Laura Kriener , Garibaldi Pineda García , Thomas Nowotny , Walter Senn , Mihai A. Petrovici

Categories: Machine Learning | Neural and Evolutionary Computing

2022-12-20

Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which, however, requires biologically implausible weight transport from feed-forward to feedback paths. We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity and using only information locally available to the synapses. Our method is completely phase-free (no forward and backward passes or phased learning) and allows for efficient error propagation across multi-layer cortical hierarchies, while maintaining biologically plausible signal transport and learning. Our method is applicable to a wide class of models and improves on previously known biologically plausible ways of credit assignment: compared to random synaptic feedback, it can solve complex tasks with fewer neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding.


HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

Andrei Zanfir , Mihai Zanfir , Alexander Gorban , Jingwei Ji , Yin Zhou , Dragomir Anguelov , Cristian Sminchisescu

Category: Computer Vision

2022-12-15

Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades -- with cars potentially boasting complex LiDAR and vision systems and with a growing expansion of the available body of dedicated datasets for this newly available information -- not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims, where we achieve state-of-the-art results on the task of 3D pose estimation.


PhoMoH: Implicit Photorealistic 3D Models of Human Heads

Mihai Zanfir , Thiemo Alldieck , Cristian Sminchisescu

Category: Computer Vision

2022-12-14

We present PhoMoH, a neural network methodology to construct generative models of photorealistic 3D geometry and appearance of human heads including hair, beards, clothing and accessories. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photorealistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and allow the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.


Structured 3D Features for Reconstructing Relightable and Animatable Avatars

Enric Corona , Mihai Zanfir , Thiemo Alldieck , Eduard Gabriel Bazavan , Andrei Zanfir , Cristian Sminchisescu

Category: Computer Vision

2022-12-13

We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which in turn, additionally helps modeling accessories, hair, and loose clothing. Owing to this, we present a complete 3D transformer-based attention framework which, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition, as a result of a single end-to-end model, trained semi-supervised, and with no additional postprocessing. We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. Moreover, we show that the proposed methodology allows novel view synthesis, relighting, and re-posing the reconstruction, and can naturally be extended to handle multiple input images (e.g. different views of a person, or the same view, in different poses, in video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.


Overview of The MediaEval 2022 Predicting Video Memorability Task

Lorin Sweeney , Mihai Gabriel Constantin , Claire-Hélène Demarty , Camilo Fosco , Alba G. Seco de Herrera , Sebastian Halder , Graham Healy , Bogdan Ionescu , Ana Matran-Fernandez , Alan F. Smeaton

Categories: Computer Vision | Artificial Intelligence

2022-12-13

This paper describes the 5th edition of the Predicting Video Memorability Task as part of MediaEval2022. This year we have reorganised and simplified the task in order to lubricate a greater depth of inquiry. Similar to last year, two datasets are provided in order to facilitate generalisation, however, this year we have replaced the TRECVid2019 Video-to-Text dataset with the VideoMem dataset in order to remedy underlying data quality issues, and to prioritise short-term memorability prediction by elevating the Memento10k dataset as the primary dataset. Additionally, a fully fledged electroencephalography (EEG)-based prediction sub-task is introduced. In this paper, we outline the core facets of the task and its constituent sub-tasks; describing the datasets, evaluation metrics, and requirements for participant submissions.


Experiences from the MediaEval Predicting Media Memorability Task

Alba García Seco de Herrera , Mihai Gabriel Constantin , Claire-Hélène Demarty , Camilo Fosco , Sebastian Halder , Graham Healy , Bogdan Ionescu , Ana Matran-Fernandez , Alan F. Smeaton , Mushfika Sultana

Categories: Computer Vision | Artificial Intelligence

2022-12-07

The Predicting Media Memorability task in the MediaEval evaluation campaign has been running annually since 2018 and several different tasks and data sets have been used in this time. This has allowed us to compare the performance of many memorability prediction techniques on the same data and in a reproducible way and to refine and improve on those techniques. The resources created to compute media memorability are now being used by researchers well beyond the actual evaluation campaign. In this paper we present a summary of the task, including the collective lessons we have learned for the research community.


Event knowledge in large language models: the gap between the impossible and the unlikely

Carina Kauf , Anna A. Ivanova , Giulia Rambelli , Emmanuele Chersoni , Jingyuan S. She , Zawad Chowdhury , Evelina Fedorenko , Alessandro Lenci

Categories: Natural Language Processing | Artificial Intelligence

2022-12-02

People constantly use language to learn about the world. Computational linguists have capitalized on this fact to build large language models (LLMs) that acquire co-occurrence-based knowledge from language corpora. LLMs achieve impressive performance on many tasks, but the robustness of their world knowledge has been questioned. Here, we ask: do LLMs acquire generalized knowledge about real-world events? Using curated sets of minimal sentence pairs (n=1215), we tested whether LLMs are more likely to generate plausible event descriptions compared to their implausible counterparts. We found that LLMs systematically distinguish possible and impossible events (The teacher bought the laptop vs. The laptop bought the teacher) but fall short of human performance when distinguishing likely and unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLMs generalize well across syntactic sentence variants (active vs passive) but less well across semantic sentence variants (synonymous sentences), (iii) some, but not all LLM deviations from ground-truth labels align with crowdsourced human judgments, and (iv) explicit event plausibility information emerges in middle LLM layers and remains high thereafter. Overall, our analyses reveal a gap in LLMs' event knowledge, highlighting their limitations as generalized knowledge bases. We conclude by speculating that the differential performance on impossible vs. unlikely events is not a temporary setback but an inherent property of LLMs, reflecting a fundamental difference between linguistic knowledge and world knowledge in intelligent systems.
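The minimal-pair test described in the abstract can be sketched as follows: a model scores both sentences of a pair, and the pair counts as correct when the plausible sentence receives the higher score. This is only an illustration. The function name `minimal_pair_accuracy` and the scorer interface are hypothetical, and the lookup-table scores are placeholder values standing in for real LLM sentence scores, not results from the paper.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (plausible, implausible) pairs where the plausible
    sentence gets the strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    wins = sum(1 for good, bad in pairs if score(good) > score(bad))
    return wins / len(pairs)


# Example sentences are quoted from the abstract; the scores are
# placeholders (hypothetical), not measured model outputs.
placeholder_scores = {
    "The teacher bought the laptop": -10.0,
    "The laptop bought the teacher": -14.0,
    "The nanny tutored the boy": -12.0,
    "The boy tutored the nanny": -11.5,
}

pairs = [
    ("The teacher bought the laptop", "The laptop bought the teacher"),
    ("The nanny tutored the boy", "The boy tutored the nanny"),
]

accuracy = minimal_pair_accuracy(pairs, placeholder_scores.__getitem__)
```

In practice the scorer would be replaced by a function returning a sentence-level score from a language model, such as a summed token log-probability. Which scoring function the authors used is not stated in the abstract.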


Symphony in the Latent Space: Provably Integrating High-dimensional Techniques with Non-linear Machine Learning Models

Qiong Wu , Jian Li , Zhenming Liu , Yanhua Li , Mihai Cucuringu

Category: Machine Learning

2022-12-01

This paper revisits building machine learning algorithms that involve interactions between entities, such as those between financial assets in an actively managed portfolio, or interactions between users in a social network. Our goal is to forecast the future evolution of ensembles of multivariate time series in such applications (e.g., the future return of a financial asset or the future popularity of a Twitter account). Designing ML algorithms for such systems requires addressing the challenges of high-dimensional interactions and non-linearity. Existing approaches usually adopt an ad-hoc approach to integrating high-dimensional techniques into non-linear models and recent studies have shown these approaches have questionable efficacy in time-evolving interacting systems. To this end, we propose a novel framework, which we dub as the additive influence model. Under our modeling assumption, we show that it is possible to decouple the learning of high-dimensional interactions from the learning of non-linear feature interactions. To learn the high-dimensional interactions, we leverage kernel-based techniques, with provable guarantees, to embed the entities in a low-dimensional latent space. To learn the non-linear feature-response interactions, we generalize prominent machine learning techniques, including designing a new statistically sound non-parametric method and an ensemble learning algorithm optimized for vector regressions. Extensive experiments on two common applications demonstrate that our new algorithms deliver significantly stronger forecasting power compared to standard and recently proposed methods.
